Data Visualization

Introduction

Data visualization is one of the main steps on the way to understanding a dataset. General information on data visualization (beyond Python) can be found in the following list:

https://extremepresentation.typepad.com/files/chart-chooser-2020.pdf.

A major difference in the visualization solutions relies on the possibility of performing interactive inspection; otherwise, the solution is said static.

Interactive tools for data visualization are emerging in Python with plotly, altair, Bokeh, etc. An extensive study by Aarron Geller provides the pros and cons of each method.

Python

The list is long (and growing) of Python packages for data visualization. We provide some examples in the pandas section of the website, and also in the Scipy course.

matplotlib: Visualization with Python

Source: https://matplotlib.org/.

This is the standard library for plots in Python. The documentation is well written and matplotlib should be the default choice for creating static documents (e.g., .pdf or .doc files).

Usual loading commands:

import matplotlib.pyplot as plt

Example::

import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 2 * np.pi, 1024)
ft1 = np.sin(2 * np.pi * t)
ft2 = np.cos(2 * np.pi * t)
fig, ax = plt.subplots()
ax.plot(ft1, label='sin')
ax.plot(ft2, label='cos')
ax.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7f46f6b8d330>

seaborn: statistical data visualization

Source: https://seaborn.pydata.org/.

seaborn is built over matplotlib and is specifically tailored for data visualization (maptlotlib is a more flexible and general tool). Default settings are usually nicer than one from maptlotlib, especially for standard tools (histograms, KDE, swarmplots, etc.).

Usual loading commands:

import seaborn as sns

Example::

import seaborn as sns
import pandas as pd
df = pd.DataFrame(dict(sin=ft1, cos=ft2))
sns.set_style("whitegrid")
ax = sns.lineplot(data=df)
sns.move_legend(ax, "lower right")
sns.despine()

plotly: a graphing library for Python

Source: https://plotly.com/python/.

The force of plotly is that it is interactive and can handle R software or julia on top of Python (it relies on Java Script under the hood).

Usual loading commands:

import plotly

Alternatively, you can also use plotly.express to use predefined figures:

import plotly.express as px
import plotly.express as px
fig = px.line(df)
fig.show()
Note

In plotly the figure is interactive. If you click on the legend on the right, you can select a curve to activate/deactivate.

But now you can also create a slider to change a parameter, for instance showing the functions

\begin{align*} f_w: t \to \sin(2 \cdot \pi \cdot w \cdot t)\\ g_w: t \to \sin(2 \cdot \pi \cdot w \cdot t) \end{align*} for w \in [-5, 5]

# inspiration from:
# https://community.plotly.com/t/multiple-traces-with-a-single-slider-in-plotly/16356
import plotly.graph_objects as go
import numpy as np

n_step = 101
# Create figure
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

num_steps = 101
slider_range = np.linspace(-5, 5 , num=num_steps)
trace_list1 = []
trace_list2 = []

for i, w  in enumerate(slider_range):
    trace_list1.append(go.Scatter(y=np.sin(2*np.pi*t*w), visible=False, line={'color': 'red'}, name=f"sin({w:.2f} * 2 *pi)"))

    trace_list2.append(go.Scatter(y=np.cos(2*np.pi*t *w), visible=False, line={'color': 'blue'}, name=f"cos({w:.2f} * 2 *pi)"))

fig = go.Figure(data=trace_list1+trace_list2)

# Initialize display:
fig.data[51].visible = True
fig.data[51 + n_step].visible = True


steps = []
for i in range(num_steps):
    # Hide all traces
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(fig.data)],
    )
    # Enable the two traces we want to see
    step['args'][1][i] = True
    step['args'][1][i+num_steps] = True

    # Add step to step list
    steps.append(step)

sliders = [dict(
    active = 50,
    currentvalue={"prefix": "w= "},
    steps = steps,
)]

fig.layout.sliders = sliders

iplot(fig, show_link=False)

bokeh: interactive visualizations in the browsers

Source: http://bokeh.org/.

XXX: TODO

Usual loading commands:

import bokeh

vega-altair: declarative visualization in Python

Source: https://altair-viz.github.io/.

Usual loading commands:

import altair as alt

XXX: TODO

pygal: python charting

Source: https://www.pygal.org/.

We use it mostly for maps, and especially for the map of France with DOMs. For instance, see the course Creating a Python module, such a map is constructed:

import pygal

To access the French map plugin you need the installation step that follows:

pip install pygal_maps_fr

Shiny: interactive web applications

Source: https://shiny.posit.co/py/.

Shiny helps you to customize the layout and style of your application and dynamically respond to events, such as a button press, or dropdown selection. It can then be interfaced easily with Quarto to render the app on your website, using a {shinylive-python} cell; see an example at https://quarto-ext.github.io/shinylive/.

#| echo: true
#| standalone: true
#| viewerHeight: 630

import matplotlib.pyplot as plt
import numpy as np
from shiny import ui, render, App

# Create some random data
t = np.linspace(0, 2 * np.pi, 1024)
num_steps = 101
slider_range = np.linspace(-5, 5 , num=num_steps)


app_ui = ui.page_fixed(
    ui.h2("Playing with sliders"),
    ui.layout_sidebar(
        ui.panel_sidebar(
            ui.input_slider("w", "Frequency", -5, 5, value=1, step=slider_range[-1]-slider_range[-2]),
        ),
        ui.panel_main(
            ui.output_plot("plot")
        )
    )
)

def server(input, output, session):
    @output
    @render.plot
    def plot():
        fig, ax = plt.subplots()
        ax.plot(t, np.sin(2*np.pi *input.w() * t), label='sin')
        ax.plot(t, np.cos(2*np.pi *input.w() * t), label='cos')

        return fig


app = App(app_ui, server)

R software

XXX TODO.

Java Script

XXX TODO. Out of the scope of this course, yet more powerful.

D3JS

Observable

This is of interest as Quarto can directly read such kinds of figures.

Back to top